The analysis of driving fatalities in the UK (1969-1984)
Introduction
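I have decided to analyse the number of driving fatalities in the UK
from 1969 to 1984, using R's built-in UKDriverDeaths dataset (monthly
totals of car drivers killed or seriously injured in Great Britain). We
will look for evidence of seasonality, general trends and outliers, and
discuss reasons for the results, using Meta's Prophet forecasting
package.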
Let’s start by visualising the data.
The code below plots the raw data. From this we should be able to
assess the general trend and seasonality of our data set.
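A minimal sketch of such a plot, assuming the built-in UKDriverDeaths series from R's datasets package (the title and axis labels are illustrative choices, not necessarily those used for the original figure):

```r
# UKDriverDeaths ships with R's datasets package as a monthly ts object
data("UKDriverDeaths")

# Line plot of the raw monthly series
plot(UKDriverDeaths,
     main = "UK Driver Deaths (1969-1984)",
     xlab = "Year", ylab = "Number of Deaths")

# Quick checks on the span and sampling frequency of the series
print(start(UKDriverDeaths))
print(end(UKDriverDeaths))
print(frequency(UKDriverDeaths))
```

Checking the frequency up front is useful because it confirms that the series is monthly, which matters for the seasonal analysis that follows.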

Noticeably, there is an overall decline after the mid-1970s. This
could be due to the gradual introduction of road safety measures, with
the "seat belt law", which came into force in 1983, coinciding with a
further drop near the end of the series. It is also apparent that there
are repeating peaks each year, which suggests a seasonal pattern.
This could indicate higher deaths in certain months, e.g. in winter.
Weather conditions have an impact on road safety, for example heavy
rainfall reducing tyre grip and visibility. The fluctuations in deaths
are more prominent in the early years (1969-1975) and appear to
stabilise later on. Returning to the seat belt law, there is a
noticeable decrease in fatalities around 1983, which is consistent with
this explanation. Other potentially important factors include the oil
crises of 1973 and 1979. Looking at the graph, there are noticeable
dips around these periods.
Box Plot
Now we're going to use the following code to produce a
monthly box plot of the data set.
boxplot(UKDriverDeaths ~ cycle(UKDriverDeaths),
main = "Boxplot of UK Driver Deaths by Month",
xlab = "Month", ylab = "Number of Deaths",col ="pink")

Overall Trend: The box plot reveals a clear
seasonal pattern in UK driver deaths. It is evident that the number of
deaths tends to be lower in the earlier months of the year
(January-April) and higher in the later months (October-December).
Median: The median number of deaths (the horizontal
line within each box) generally increases from January to December. The
month with the lowest median appears to be April (Box 4), with a
median of around 1400, and the highest is December (Box 12), with a
median of around 2200.
Spread (IQR): The interquartile range (IQR),
represented by the box height, varies across the months. The IQR is
narrow in earlier months, suggesting less variability in driver deaths
during that period. In contrast, months like November and December
have wider boxes, indicating greater variability.
Whiskers: The whiskers show the typical range of
driver deaths for each month, extending to the most extreme observation
within 1.5 times the IQR of the box. Longer whiskers, such as those in
October and December, suggest a wider range of potential death counts,
possibly due to varying weather conditions or increased holiday travel,
as anticipated from the basic time series plot.
Outliers: It is clear that there are several
outliers (points beyond the whiskers). April (Box 4) has a few outliers
and October (Box 10), November (Box 11) and December (Box 12) have
higher outliers. These represent months with unusually low or high
numbers of driver deaths, potentially due to specific events such as
unusually good or bad weather, major holidays or policy changes.
Let's summarise the overall seasonal pattern and address potential
factors. Poorer weather conditions during autumn and winter
(October-December), such as rain, snow and ice, can increase the risk of
accidents. Shorter daylight hours during these months contribute to
reduced visibility and higher accident rates. Along with this, increased
travel during the holiday season (particularly in December) may further
elevate the number of deaths.
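The medians, IQRs and outliers read off the box plot can also be checked numerically. The following is a rough sketch in base R; the names deaths, month, monthly_summary and outliers are introduced here for illustration and are not part of the original analysis:

```r
# Monthly values and their calendar month as a labelled factor
deaths <- as.numeric(UKDriverDeaths)
month  <- factor(as.integer(cycle(UKDriverDeaths)),
                 levels = 1:12, labels = month.abb)

# Median and interquartile range for each calendar month
monthly_summary <- data.frame(
  median = as.numeric(tapply(deaths, month, median)),
  iqr    = as.numeric(tapply(deaths, month, IQR)),
  row.names = month.abb
)
print(monthly_summary)

# Points beyond the whiskers, using the same 1.5 * IQR rule as boxplot()
outliers <- lapply(split(deaths, month),
                   function(x) boxplot.stats(x)$out)
print(outliers[lengths(outliers) > 0])
```

Comparing this table with the plot is a simple way to confirm which months have the lowest and highest medians without relying on visual estimates.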
Linear Regression Model
This linear regression plot shows the overall trend in UK Driver
Deaths over time.
The downward-sloping regression line suggests a decreasing trend in
driver deaths from 1969 to 1984. The data points are scattered but show
a peak for each year, once again highlighting the seasonal pattern
that was shown in the box plot.
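A trend line of this kind could be fitted roughly as follows. This is a sketch assuming a simple regression of the monthly counts on the time index; trend_df and trend_fit are illustrative names, and the exact model behind the original plot is not shown in the report:

```r
# Time index (in decimal years) and monthly counts as plain numeric vectors
trend_df <- data.frame(
  time   = as.numeric(time(UKDriverDeaths)),
  deaths = as.numeric(UKDriverDeaths)
)

# Ordinary least squares fit of deaths on time
trend_fit <- lm(deaths ~ time, data = trend_df)

# Scatter plot of the data with the fitted regression line overlaid
plot(trend_df$time, trend_df$deaths, pch = 16, col = "grey40",
     main = "Linear Trend in UK Driver Deaths",
     xlab = "Year", ylab = "Number of Deaths")
abline(trend_fit, col = "red", lwd = 2)

# The slope is the estimated change in monthly deaths per year
print(coef(trend_fit)["time"])
```

A negative slope corresponds to the downward-sloping line described above. Note that this model ignores seasonality, so it only summarises the long-run direction.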
The decreasing trend may be due to improved vehicle safety standards,
better road infrastructure and traffic regulations implemented during
this period.
The Road Safety Act 1967 made it an offence to drive a vehicle with
a blood alcohol concentration (BAC) in excess of 80mg of alcohol per
100ml of blood.
With this rule already being enforced throughout the period, it is
likely that this, combined with the seat belt law in 1983, contributed
to the overall decline in driver deaths.
Breusch-Pagan Test
With a p-value > 0.05, we fail to reject the null hypothesis of
constant variance, so there is no significant evidence of
heteroscedasticity and the standard errors of the linear regression
model can be treated as reliable. In other words, the variance of the
residuals does not change in a way that would undermine the regression
model: there isn't a large difference in how much the number of deaths
fluctuates around the trend line in any given year or season.
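The mechanics of the test can be sketched in base R. This is an illustrative reconstruction under stated assumptions, not the report's own code: it assumes the same simple trend model (deaths regressed on time) and uses the studentized (Koenker) form of the Breusch-Pagan statistic, which, to my understanding, is also the default of lmtest::bptest:

```r
# Refit the simple trend model (assumed form: deaths ~ time)
trend_df <- data.frame(
  time   = as.numeric(time(UKDriverDeaths)),
  deaths = as.numeric(UKDriverDeaths)
)
trend_fit <- lm(deaths ~ time, data = trend_df)

# Auxiliary regression: squared residuals on the same regressor
trend_df$sq_resid <- resid(trend_fit)^2
aux_fit <- lm(sq_resid ~ time, data = trend_df)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression,
# compared with a chi-squared distribution (df = number of regressors = 1)
bp_stat <- nrow(trend_df) * summary(aux_fit)$r.squared
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

cat("BP statistic:", bp_stat, "\n")
cat("p-value:", bp_pval, "\n")
```

A p-value above 0.05 from this calculation means the squared residuals show no significant linear relationship with time, which is the sense in which the variance is treated as constant.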
Forecast Model
A forecasting model will allow us to predict whether the patterns
that we've established will persist in the future, with the following
code:
Building the Forecasting model using Prophet.
UKDriver_model = prophet::prophet(UKDriverDeaths.df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
UKDriver_forecast = prophet::make_future_dataframe(UKDriver_model, periods=24, freq="quarter")
UKDriver_predict = predict(UKDriver_model, UKDriver_forecast)
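# plot() for a prophet model takes xlabel/ylabel and returns a ggplot object;
# a main = argument is not used, so the title is added with ggtitle() instead
plot(UKDriver_model, UKDriver_predict,
xlabel = "Year", ylabel = "Number of Deaths") +
ggplot2::ggtitle("Forecast of UK Driver Deaths")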

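It's worth mentioning that Prophet automatically disabled daily and
weekly seasonality because the observations are monthly, while keeping
yearly seasonality, which is what captures the month-to-month pattern.
The call to make_future_dataframe generates a future data frame with 24
additional periods (quarters) for forecasting. The freq="quarter"
argument spaces the future dates three months apart, giving a six-year
horizon. Note that the dataset itself is monthly, so freq="month" with
periods=72 would cover the same horizon at the data's own resolution.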
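The Prophet call above relies on a data frame named UKDriverDeaths.df, whose construction is not shown. Prophet expects a date column called ds and a value column called y, so one plausible way to build it from the monthly series is sketched below. The future_quarterly and future_monthly vectors are illustrative only: they use base R's seq() to show how the two frequency choices space the future dates, and are not a reproduction of Prophet's internals:

```r
# Data frame in the layout Prophet expects: ds (dates) and y (values).
# Each monthly observation is dated to the first day of its month.
UKDriverDeaths.df <- data.frame(
  ds = seq(as.Date("1969-01-01"), by = "month",
           length.out = length(UKDriverDeaths)),
  y  = as.numeric(UKDriverDeaths)
)

# Compare the future dates implied by quarterly and monthly steps
last_date <- max(UKDriverDeaths.df$ds)
future_quarterly <- seq(last_date, by = "quarter", length.out = 25)[-1]
future_monthly   <- seq(last_date, by = "month",   length.out = 73)[-1]

print(range(future_quarterly))  # 24 dates, three months apart
print(range(future_monthly))    # 72 dates, one month apart
```

Both choices end at the same date, but the monthly version gives a forecast at every month rather than at every third month, so it matches the resolution of the historical data.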
This plot visualises the forecasted number of UK driver
deaths from approximately 1969 to 1990, based on historical data from
January 1969 to December 1984.
The plot shows the actual data points (the black dots), the
forecasted values (the blue line), and the uncertainty intervals (shaded
blue area).
General Fit: The overall trend and seasonal patterns
that were previously discussed have been captured reasonably well. The
blue line seems to be a suitable fit for the data, represented by the
black dots.
Downward Trend: The forecast continues the downward
trend that was highlighted in the linear regression model. This
suggests that the factors contributing to the decline in previous years
are expected to keep contributing in the future.
Seasonality: Although the x-axis of the forecast
plot is labelled by year, upon zooming into the graph you can still
identify the individual seasonal patterns within each year.
This aligns with the seasonal pattern observed in the box plot
analysis.
Uncertainty intervals: The uncertainty intervals
(shaded blue area) widen as the forecast extends further into the future.
This indicates the model is less confident in its predictions for more
distant time periods, which is expected due to the general uncertainty
in forecasting.
Underestimation: It appears that the forecast may
be underestimating the data in later years (1980s). The blue line seems
to be consistently below some of the highest data points. This could be
due to the model not fully capturing the magnitude of the seasonal
fluctuations or the influence of specific events.
Bubble Graph
I will now construct a bubble graph to help visualise the seasonal
trends across different months over the various years. This
allows for an easy comparison of how fatal accidents varied across time
while also highlighting the magnitude of fatalities, with the following
code:
# plotly provides plot_ly(), layout() and the %>% pipe used below
library(plotly)
# UKDriverDeaths is the built-in monthly time series
data <- data.frame(
  month = as.numeric(cycle(UKDriverDeaths)),       # Month number (1-12)
  year = as.numeric(floor(time(UKDriverDeaths))),  # Calendar year
  deaths = as.numeric(UKDriverDeaths)              # Monthly count as a numeric vector
)
# Bubble size represents the number of deaths; colour represents the year.
fig <- plot_ly(data, x = ~month, y = ~deaths, text = ~paste("Year: ", year),
               type = 'scatter', mode = 'markers',
               marker = list(size = ~deaths / 10, opacity = 0.6, color = ~year,
                             colorscale = 'Reds', showscale = TRUE))
# Customize layout
fig <- fig %>% layout(
  title = 'Bubble Chart of UK Driver Deaths by Month',
  xaxis = list(title = 'Month', tickmode = 'array', tickvals = 1:12,
               ticktext = c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                            'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')),
  yaxis = list(title = 'Number of Deaths'),
  showlegend = FALSE
)
# Show the plot
fig
Higher deaths in early years: The largest bubbles
(indicating higher fatalities) are mostly in the earlier years (1970s),
with colours closer to the light shades of the colour scale. This
suggests that deaths were noticeably higher in the early years and
likely reduced over time. This is something we saw in the original raw
plot that showcased the general trend.
Seasonal Trends: November, December and January
tend to have larger bubbles, meaning higher numbers of driver deaths.
This could be due to worse driving conditions in winter (e.g. icy roads,
fog and shorter daylight hours), which is consistent with the box
plot.
Steady decline in deaths over time: As the colour
becomes darker towards 1984, the bubbles tend to shrink, indicating a
reduction in fatalities. We considered previously the effect that
improved safety laws had on driving fatalities, and this could also be
relevant here. Once again, the downward trend is supported visually by
this bubble chart, as expected.
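The decline suggested by the shrinking bubbles can also be checked with annual totals. This is a short sketch in base R; annual_totals, early_mean and late_mean are names introduced here, and splitting the series into two eight-year halves is an arbitrary choice made for illustration:

```r
# Sum the twelve monthly values of each year into one annual total
annual_totals <- aggregate(UKDriverDeaths, nfrequency = 1, FUN = sum)
print(annual_totals)

# Compare the average annual total in the first and second half of the series
early_mean <- mean(window(annual_totals, end = 1976))
late_mean  <- mean(window(annual_totals, start = 1977))
cat("Mean annual total 1969-1976:", early_mean, "\n")
cat("Mean annual total 1977-1984:", late_mean, "\n")
```

Working with annual totals removes the seasonal pattern entirely, so any difference between the two halves reflects the long-run trend rather than month-to-month variation.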
Conclusion
The analysis of UK driver deaths revealed seasonal patterns with
peaks and troughs recurring annually. This seasonality was effectively
captured by the Prophet model, demonstrating its ability to handle time
series data with periodic fluctuations.
The overall downward trend in fatalities aligns with improvements in
road safety measures and vehicle technology over time.
Using Prophet, a forecast was produced for 24 future quarters. Our
forecast predicts that this downward trend is likely to persist in the
future, and Prophet's yearly seasonality component allows the forecast
to retain the seasonal pattern, providing a realistic view of future
trends.
Impact of external factors: Historical events like
the oil crises in 1973 and 1979 may have influenced driving behaviour
and fatalities. While the model doesn't explicitly account for external
factors, the observed dips during these periods align with
expectations.
By leveraging Prophet's features (e.g., make_future_dataframe and
predict), we were able to generate forecasts that fit the historical
data reasonably well and gain deeper insights into the dataset.